AITopics | communication budget

Collaborating Authors

communication budget

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Order Optimal One-Shot Distributed Learning

Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani

Neural Information Processing SystemsApr-30-2026, 19:48:52 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > Middle East (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

018b59ce1fd616d874afad0f44ba338d-AuthorFeedback.pdf

Neural Information Processing SystemsApr-30-2026, 19:48:37 GMT

artificial intelligence, log mn, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Add feedback

Rethinking gradient sparsification as total error minimization

Neural Information Processing SystemsApr-25-2026, 15:45:54 GMT

Gradient compression is a widely-established remedy to tackle the communication bottleneck in distributed training of large deep neural networks (DNNs). Under the error-feedback framework, Top-k sparsification, sometimes with k as little as 0.1% of the gradient size, enables training to the same model quality as the uncompressed case for a similar iteration count. From the optimization perspective, we find that Top-k is the communication-optimal sparsifier given a per-iteration k element budget. We argue that to further the benefits of gradient sparsification, especially for DNNs, a different perspective is necessary -- one that moves from per-iteration optimality to consider optimality for the entire training. We identify that the total error -- the sum of the compression errors for all iterations -- encapsulates sparsification throughout training.

artificial intelligence, compressor, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

A Theoretical Framework for Energy-Aware Gradient Pruning in Federated Learning

Athanasakos, Emmanouil M.

arXiv.org Machine LearningMar-25-2026

Federated Learning (FL) is constrained by the communication and energy limitations of decentralized edge devices. While gradient sparsification via Top-K magnitude pruning effectively reduces the communication payload, it remains inherently energy-agnostic. It assumes all parameter updates incur identical downstream transmission and memory-update costs, ignoring hardware realities. We formalize the pruning process as an energy-constrained projection problem that accounts for the hardware-level disparities between memory-intensive and compute-efficient operations during the post-backpropagation phase. We propose Cost-Weighted Magnitude Pruning (CWMP), a selection rule that prioritizes parameter updates based on their magnitude relative to their physical cost. We demonstrate that CWMP is the optimal greedy solution to this constrained projection and provide a probabilistic analysis of its global energy efficiency. Numerical results on a non-IID CIFAR-10 benchmark show that CWMP consistently establishes a superior performance-energy Pareto frontier compared to the Top-K baseline.

artificial intelligence, machine learning, selection rule, (14 more...)

arXiv.org Machine Learning

2603.22465

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Greece > Attica > Athens (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Add feedback

Order Optimal One-Shot Distributed Learning

Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani

Neural Information Processing SystemsFeb-11-2026, 07:42:43 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, mre-clog algorithm, server, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

018b59ce1fd616d874afad0f44ba338d-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-11-2026, 07:42:28 GMT

assumption, communication budget, log mn, (16 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Add feedback

Rethinking gradient sparsification as total error minimization

Neural Information Processing SystemsFeb-8-2026, 10:07:40 GMT

We identify that the total error -- the sum of the compression errors for all iterations -- encapsulates sparsification throughout training.

artificial intelligence, compressor, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Kansas (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

222afbe0d68c61de60374b96f1d86715-Supplemental.pdf

Neural Information Processing SystemsFeb-7-2026, 19:14:58 GMT

estimation, randomness, unbiased estimator, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Add feedback

SA-PEF: Step-Ahead Partial Error Feedback for Efficient Federated Learning

Redie, Dawit Kiros, Arablouei, Reza, Werner, Stefan

arXiv.org Machine LearningJan-29-2026

Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data, the residual error can decay slowly, causing gradient mismatch and stalled progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient α = 0 and step-ahead EF (SAEF) when α = 1. For non-convex objectives and δ-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationar-ity under heterogeneous data and partial client participation. To balance SAEF's rapid warm-up with EF's long-term stability, we select α near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF. Modern large-scale machine learning increasingly relies on distributed computation, where both data and compute are spread across many devices. Federated learning (FL) enables model training in this setting without centralizing raw data, enhancing privacy and scalability under heterogeneous client distributions (McMahan et al., 2017; Kairouz et al., 2021). In each synchronous FL round, the server broadcasts the current global model to a subset of clients. These clients perform several steps of stochastic gradient descent (SGD) on their local data and return updates to the server, which aggregates them to form the next global iterate (Huang et al., 2022; Wang & Ji, 2022; Li et al., 2024). Although FL leverages rich distributed data, it faces two key challenges.

artificial intelligence, compression, machine learning, (17 more...)

arXiv.org Machine Learning

2601.20738

Country:

Europe > Norway (0.14)
Oceania > Australia (0.04)
North America > United States > Virginia (0.04)
Europe > Finland (0.04)

Genre: Research Report > New Finding (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Prediction-space knowledge markets for communication-efficient federated learning on multimedia tasks

Du, Wenzhang

arXiv.org Artificial IntelligenceDec-2-2025

Federated learning (FL) enables collaborative training over distributed multimedia data but suffers acutely from statistical heterogeneity and communication constraints, especially when clients deploy large models. Classic parameter-averaging methods such as FedAvg transmit full model weights and can diverge under nonindependent and identically distributed (non-IID) data. We propose KTA v2, a prediction-space knowledge trading market for FL. Each round, clients locally train on their private data, then share only logits on a small public reference set. The server constructs a client-client similarity graph in prediction space, combines it with reference-set accuracy to form per-client teacher ensembles, and sends back personalized soft targets for a second-stage distillation update. This two-stage procedure can be interpreted as approximate block-coordinate descent on a unified objective with prediction-space regularization. Experiments on FEMNIST, CIFAR-10 and AG News show that, under comparable or much lower communication budgets, KTA v2 consistently outperforms a local-only baseline and strong parameter-based methods (FedAvg, FedProx), and substantially improves over a FedMD-style global teacher. On CIFAR-10 with ResNet-18, KTA v2 reaches 57.7% test accuracy using approximately 1/1100 of FedAvg's communication, while on AG News it attains 89.3% accuracy with approximately 1/300 of FedAvg's traffic.

artificial intelligence, federated learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2512.00841

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback